-
Notifications
You must be signed in to change notification settings - Fork 822
feat: get block ids with load errors #5668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add infrastructure to track block IDs that fail during async KV cache loading, enabling vLLM to identify and handle failed blocks gracefully. Changes: - protocol.rs: Add block_ids field to WorkerTransferRequest - scheduler.rs: Update tests with new block_ids field - slot.rs: Populate block_ids for onboard/offload operations - worker.rs: Add failed_block_ids tracking and get_block_ids_with_load_errors method - connector_worker.py: Expose get_block_ids_with_load_errors to Python - dynamo_connector.py: Implement KVConnectorBase_V1 interface method Signed-off-by: Patrick Riel <priel@nvidia.com>
…acking - Add failure_tx/failure_rx channel between Scheduler and WorkerSchedulerClient - Add drain_failures() method to collect pending failure notifications - Track request_id -> uuid -> block_ids mapping in KvConnectorWorker - Populate failed_block_ids from scheduler failure notifications - Clean up block_ids tracking when requests complete Signed-off-by: Patrick Riel <priel@nvidia.com>
WalkthroughAdds end-to-end propagation and retrieval of block load failure IDs: transfer requests carry Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs`:
- Around line 293-305: When enqueuing onboarding_operations we currently stash
request->blocks in request_to_blocks but later remove that mapping in
get_finished before we call get_block_ids_with_load_errors, which can drop the
failure->block correlation; fix by ensuring failures are drained/translated into
block IDs before cleanup: call get_block_ids_with_load_errors (or a shared
helper that drains scheduler failures and maps them to block IDs using
request_to_blocks) prior to removing the request's entry from request_to_blocks
in get_finished; factor the draining/mapping logic into a reusable helper (e.g.,
drain_and_map_load_failures) and use it from both the onboarding loop code paths
and the get_finished cleanup to preserve failure→block mappings.
🧹 Nitpick comments (1)
lib/llm/src/block_manager/connector/protocol.rs (1)
152-154: Confirm whether#[serde(default)]should mask missingblock_ids.
Defaulting to empty accepts older senders but silently drops error tracking. If deployments are atomic and you want strictness, consider making the field required so deserialization fails fast when it’s absent.Based on learnings, keep the spec explicit if backward compatibility isn't required.💡 Possible tweak to keep the protocol explicit
- #[serde(default)] - pub block_ids: Vec<usize>, + pub block_ids: Vec<usize>,
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Signed-off-by: Patrick Riel <priel@nvidia.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs`:
- Line 127: Remove the trailing whitespace on the affected line in
lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs (the line flagged
by CI around the worker module) so the file has no extraneous spaces at
end-of-line; you can delete the trailing space directly or run a whitespace
cleanup (editor trim trailing whitespace or rustfmt/trim tool) and re-run the
pre-commit hook to verify the CI error is resolved.
🧹 Nitpick comments (2)
lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs (2)
128-129: Track the TODO with an issue (or handle in this PR).Leaving the “move this out of bindings” TODO without tracking risks it lingering.
Do you want me to propose a follow-up issue or a concrete refactor plan?
136-137: Avoid silent truncation when casting block IDs tou32.If
block_idcan exceedu32::MAX,as u32will truncate. Considertry_from(or an assertion) to make this safe.♻️ Safer conversion
- for &block_id in block_ids { - self.failed_block_ids.insert(block_id as u32); - } + for &block_id in block_ids { + match u32::try_from(block_id) { + Ok(block_id_u32) => { + self.failed_block_ids.insert(block_id_u32); + } + Err(_) => { + tracing::warn!( + request_id = %request_id, + operation_id = %uuid, + block_id, + "block_id exceeds u32::MAX; skipping" + ); + } + } + }
| failed_block_ids: HashSet::new(), | ||
| }) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing whitespace to fix the failing pre-commit hook.
Line 127 has trailing whitespace, which is already failing CI.
🤖 Prompt for AI Agents
In `@lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs` at line 127,
Remove the trailing whitespace on the affected line in
lib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs (the line flagged
by CI around the worker module) so the file has no extraneous spaces at
end-of-line; you can delete the trailing space directly or run a whitespace
cleanup (editor trim trailing whitespace or rustfmt/trim tool) and re-run the
pre-commit hook to verify the CI error is resolved.
Overview:
Add infrastructure to track and report block IDs that fail during asynchronous KV cache load operations in the KVBM connector. This enables vLLM to identify failed blocks and take corrective action (e.g., mark for recomputation).
Details:
block_idsfield toWorkerTransferRequestprotocol for tracking which blocks belong to each transfer operationfailed_block_idsfield andget_block_ids_with_load_errors()method toKvConnectorWorkerconnector_worker.py,dynamo_connector.py)failure_tx/failure_rxchannel betweenSchedulerandWorkerSchedulerClientto propagate transfer failuresdrain_failures()method to collect pending failure notificationsrequest_id→uuid→block_idsmapping in worker to convert failure notifications to block IDshandle_immediate_result()to send failure notifications whenresult.status.is_err()Where should the reviewer start?
lib/llm/src/block_manager/connector/scheduler.rs- Core failure notification channel implementationlib/bindings/kvbm/src/block_manager/vllm/connector/worker.rs- Block ID tracking and error aggregation logiclib/llm/src/block_manager/connector/protocol.rs- Protocol changes (block_idsfield)Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.